Global term weights in distributed environments
Abstract
This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view of the collection is not available. In particular, two options are compared using standard IR test collections: sampling documents from the retrieval collection, and using a reference corpus that is independent of it. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection (an “extended stop word list”) are known and all terms not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus; some “domain-specific stop words” need to be added. A good solution is to mix estimates from small samples of the retrieval collection with estimates derived from a reference corpus.
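The idea of mixing estimates and pruning to a frequent-term list can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the linear interpolation with `sample_weight`, the `-log(relative document frequency)` form of IDF, and the choice of giving every unlisted term the weight of the rarest listed term are all assumptions made for illustration.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count in how many documents each term occurs; docs is a list of token lists."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return df, len(docs)


def mixed_relative_df(sample_docs, reference_docs, sample_weight=0.5):
    """Interpolate relative document frequencies from a collection sample
    and a reference corpus (assumed mixing scheme, both inputs non-empty)."""
    s_df, s_n = document_frequencies(sample_docs)
    r_df, r_n = document_frequencies(reference_docs)
    terms = set(s_df) | set(r_df)
    return {
        t: sample_weight * s_df[t] / s_n + (1 - sample_weight) * r_df[t] / r_n
        for t in terms
    }


def pruned_idf(rel_df, list_size):
    """Keep only the list_size most frequent terms (the 'extended stop word
    list') and return their IDF values plus one default for all other terms."""
    ranked = sorted(
        ((t, p) for t, p in rel_df.items() if p > 0),
        key=lambda item: (-item[1], item[0]),  # most frequent first, ties by term
    )[:list_size]
    idf = {t: -math.log(p) for t, p in ranked}
    # Assumption: unlisted terms all get the weight of the rarest listed term.
    default = max(idf.values()) if idf else 1.0
    return idf, default


def term_weight(term, idf, default):
    """Look up the global weight of a term; unlisted terms are treated equally."""
    return idf.get(term, default)


if __name__ == "__main__":
    sample = [["the", "protein", "binds"], ["the", "protein", "gene"], ["the", "gene"]]
    reference = [["the", "cat"], ["the", "dog"], ["a", "dog"], ["the", "protein"]]
    idf, default = pruned_idf(mixed_relative_df(sample, reference), list_size=3)
    for term in ["the", "protein", "gene", "cat"]:
        print(term, round(term_weight(term, idf, default), 3))
```

In this toy input, "protein" and "gene" are frequent in the sample but rare or absent in the reference corpus, so they play the role of domain-specific stop words that only enter the list because the sample contributes to the estimate.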
Related articles
GLOBAL ROBUST STABILITY CRITERIA FOR T-S FUZZY SYSTEMS WITH DISTRIBUTED DELAYS AND TIME DELAY IN THE LEAKAGE TERM
The paper is concerned with robust stability criteria for Takagi-Sugeno (T-S) fuzzy systems with distributed delays and time delay in the leakage term. By exploiting a model transformation, the system is converted to a neutral delay system. A global robust stability result is proposed using a new Lyapunov-Krasovskii functional which takes into account the range of delay and by making use o...
Estimation of global term weights for distributed and ubiquitous IR
This paper reports on information retrieval experiments aimed at application in ubiquitous or P2P environments. The main question to be investigated is whether global term statistics such as IDF, which normally require a global view of the document collection, can be replaced with estimates obtained from a representative (i.e. well-balanced) reference corpus without losing too much effectiven...
Spike Timing Regulation on the Millisecond Scale by Distributed Synaptic Plasticity at the Cerebellum Input Stage: A Simulation Study
The way long-term synaptic plasticity regulates neuronal spike patterns is not completely understood. This issue is especially relevant for the cerebellum, which is endowed with several forms of long-term synaptic plasticity and has been predicted to operate as a timing and a learning machine. Here we have used a computational model to simulate the impact of multiple distributed synaptic weight...
Higher moments portfolio Optimization with unequal weights based on Generalized Capital Asset pricing model with independent and identically asymmetric Power Distribution
The main criterion in investment decisions is to maximize the investor's utility. Traditional capital asset pricing models cannot be used when asset returns do not follow a normal distribution. For this reason, we use capital asset pricing model with independent and identically asymmetric power distributed (CAPM-IIAPD) and capital asset pricing model with asymmetric independent and identically a...
A context-sensitive dynamic role-based access control model for pervasive computing environments
Resources and services are accessible in pervasive computing environments from anywhere and at any time. Also, due to the ever-changing nature of such environments, the identity of users is unknown. However, users must be able to access the required resources based on their contexts. These and other similar complexities necessitate dynamic and context-aware access control models for such environmen...
Journal: Inf. Process. Manage.
Volume: 44
Publication date: 2008